AITopics | pose estimation

pose estimation

fc8ee7c7ab5b5f6b1615045dfb617ed6-Paper-Conference.pdf

Neural Information Processing Systems | Jun-23-2026, 04:09:48 GMT

Indoor environments are the primary setting where humans spend most of their daily lives. Yet computationally creating digital twins of these 3D spaces from captured images remains challenging. Factors such as the difficulty of accurate camera pose estimation from indoor images [28, 11, 1] and structural distortions in the resulting 3D reconstructions [22, 12, 21] hinder the development of robust, accurate, and user-friendly solutions for replicating indoor scenes in the digital world. Because indoor scenes are typically rich in planar structures such as floors, ceilings, and walls, as well as planar furniture like tables and cabinets, planar primitives are well suited to accurate 3D reconstruction of indoor scenes. As a result, planar 3D reconstruction has attracted significant research interest in recent years. Planar reconstruction approaches include feedforward solutions in monocular [40, 16, 27, 24, 18, 42] and two-view [11, 1, 28] settings, as well as per-scene optimization approaches [29, 38, 3, 9] that leverage posed multi-view inputs with the assistance of feedforward models. However, these approaches face two key limitations: Annotation dependence for feedforward methods: Learning feedforward models [36, 24, 28] typically requires accurate plane masks and 3D plane annotations from monocular or binocular inputs.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

RAPTR: Radar-based 3D Pose Estimation using Transformer

Neural Information Processing Systems | Jun-23-2026, 03:57:35 GMT

Radar-based indoor 3D human pose estimation has typically relied on fine-grained 3D keypoint labels, which are costly to obtain, especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose RAPTR (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels, which are considerably easier and more scalable to collect. RAPTR is characterized by a two-stage pose decoder architecture with pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities, and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods, reducing joint position error by 34.3% on HIBER and 76.9% on MMVR.
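To make the reported "joint position error" reductions concrete, here is a minimal sketch of how such a figure could be computed. It assumes the common mean per-joint position error (MPJPE) definition, which the abstract does not spell out. The function names and toy coordinates are hypothetical, not taken from the paper.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    Assumption: this is the usual MPJPE definition; the paper may use a variant.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy 3-joint skeleton with hypothetical coordinates (metres).
gt = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
baseline = [(0.3, 0.0, 0.0), (0.0, 1.3, 0.0), (0.0, 2.0, 0.3)]
improved = [(0.1, 0.0, 0.0), (0.0, 1.1, 0.0), (0.0, 2.0, 0.1)]

if __name__ == "__main__":
    base_err = mpjpe(baseline, gt)
    new_err = mpjpe(improved, gt)
    print(f"baseline MPJPE: {base_err:.3f} m, improved MPJPE: {new_err:.3f} m")
    print(f"relative reduction: {relative_reduction(base_err, new_err):.1f}%")
```

A percentage like the ones quoted above is a relative reduction of this kind, so it says nothing about the absolute error without the baseline value.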

artificial intelligence, machine learning, pose estimation, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

4D3R

Neural Information Processing Systems | Jun-23-2026, 03:29:33 GMT

Novel view synthesis from monocular videos of dynamic scenes with unknown camera poses remains a fundamental challenge in computer vision and graphics. While recent advances in 3D representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown promising results for static scenes, they struggle with dynamic content and typically rely on pre-computed camera poses. We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach. Our method first leverages 3D foundational models for initial pose and …

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization

Neural Information Processing Systems | Jun-23-2026, 00:36:07 GMT

This paper addresses the problem of weakly supervised cross-view localization, where the goal is to estimate the pose of a ground camera relative to a satellite image with noisy ground truth annotations. A common approach to bridge the cross-view domain gap for pose estimation is Bird's-Eye View (BEV) synthesis.

artificial intelligence, name change, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.61)

Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance

Neural Information Processing Systems | Jun-22-2026, 23:16:07 GMT

Accurate and scalable quantification of animal pose and appearance is crucial for studying behavior. Current 3D pose estimation techniques, such as keypoint- and mesh-based techniques, often face challenges including limited representational detail, labor-intensive annotation requirements, and expensive per-frame optimization. These limitations hinder the study of subtle movements and can make large-scale analyses impractical. We propose Pose Splatter, a novel framework leveraging shape carving and 3D Gaussian splatting to model the complete pose and appearance of laboratory animals without prior knowledge of animal geometry, per-frame optimization, or manual annotations. We also propose a rotation-invariant visual embedding technique for encoding pose and appearance, designed to be a plug-in replacement for 3D keypoint data in downstream behavioral analyses. Experiments on datasets of mice, rats, and zebra finches show that Pose Splatter learns accurate 3D animal geometries. Notably, Pose Splatter represents subtle variations in pose, provides better low-dimensional pose embeddings than state-of-the-art methods as evaluated by humans, and generalizes to unseen data. By eliminating annotation and per-frame optimization bottlenecks, Pose Splatter enables analysis of the large-scale, longitudinal behavior needed to map genotype, neural activity, and behavior at high resolutions.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Orientation Matters: Making 3D Generative Models Orientation-Aligned

Neural Information Processing Systems | Jun-22-2026, 22:34:01 GMT

Humans intuitively perceive object shape and orientation from a single image, guided by strong priors about canonical poses. However, existing 3D generative models often produce misaligned results due to inconsistent training data, limiting their usability in downstream tasks. To address this gap, we introduce the task of orientation-aligned 3D object generation: producing 3D objects from single images with consistent orientations across categories. To facilitate this, we construct Objaverse-OA, a dataset of 14,832 orientation-aligned 3D models spanning 1,008 categories. Leveraging Objaverse-OA, we fine-tune two representative 3D generative models based on multi-view diffusion and 3D variational autoencoder frameworks to produce aligned objects that generalize well to unseen objects across various categories. Experimental results demonstrate the superiority of our method over post-hoc alignment approaches. Furthermore, we showcase downstream applications enabled by our aligned object generation, including zero-shot object orientation estimation via analysis-by-synthesis and efficient arrow-based object rotation manipulation.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

bd20ff18345f0ded89242bf9ef58e46c-Paper-Position_Paper_Track.pdf

Neural Information Processing Systems | Jun-22-2026, 11:56:07 GMT

This position paper argues that human pose estimation (HPE) cannot be considered privacy-preserving or human-centric unless privacy is measured and evaluated. Although privacy concerns have become more visible in recent years, HPE systems are still assessed almost exclusively using accuracy metrics. Privacy is neither defined in measurable terms nor linked to regulatory requirements, and common deployment architectures introduce additional risks due to data transmission and storage. We highlight the limitations of current practices, including the continued reliance on RGB inputs and the lack of benchmarks that reflect legal and ethical constraints. We call for a shift in evaluation practices: privacy must become part of how HPE systems are designed, tested, and compared.

artificial intelligence, machine learning, pose estimation, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.87)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
(2 more...)

HMVLM: Human Motion-Vision-Language Model via MoE LoRA

Neural Information Processing Systems | Jun-19-2026, 14:53:10 GMT

The expansion of instruction-tuning data has enabled foundation language models to exhibit improved instruction adherence and superior performance across diverse downstream tasks. Semantically rich 3D human motion is being progressively integrated with these foundation models to enhance multimodal understanding and cross-modal generation capabilities. However, the modality gap between human motion and text raises unresolved concerns about catastrophic forgetting during this integration. In addition, developing autoregressive-compatible pose representations that preserve generalizability across heterogeneous downstream tasks remains a critical technical barrier. To address these issues, we propose the Human Motion-Vision-Language Model (HMVLM), a unified framework based on the Mixture of Expert Low-Rank Adaptation (MoE LoRA) strategy. The framework leverages a gating network to dynamically allocate LoRA expert weights based on the input prompt, enabling synchronized fine-tuning of multiple tasks. To mitigate catastrophic forgetting during instruction-tuning, we introduce a novel zero expert that preserves the pre-trained parameters for general linguistic tasks. For pose representation, we implement body-part-specific tokenization by partitioning the human body into different joint groups, enhancing the spatial resolution of the representation. Experiments show that our method effectively alleviates knowledge forgetting during instruction-tuning and achieves remarkable performance across diverse human motion downstream tasks.
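The gating idea described above can be illustrated with a toy sketch: a softmax gate mixes several low-rank (LoRA-style) updates on top of a frozen weight matrix, and a "zero expert" slot contributes no update, so routing to it leaves the frozen layer's output unchanged. This is only one plausible reading of the abstract. The function names, shapes, and the use of `None` for the zero expert are assumptions for illustration, not the paper's implementation.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_lora_forward(x, w_frozen, experts, gate_logits):
    """Frozen layer output plus a gate-weighted sum of low-rank expert updates.

    experts: list of (A, B) low-rank factor pairs, or None for a zero expert
    (hypothetical encoding) that adds nothing to the frozen output.
    """
    gates = softmax(gate_logits)
    out = matvec(w_frozen, x)
    for gate, expert in zip(gates, experts):
        if expert is None:  # zero expert: keep the pre-trained behaviour
            continue
        a, b = expert
        delta = matvec(b, matvec(a, x))  # rank-r update B(Ax)
        out = [o + gate * d for o, d in zip(out, delta)]
    return out, gates


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [[1.0, 0.0], [0.0, 1.0]]                 # frozen weights (identity)
    lora = ([[1.0, 0.0]], [[1.0], [0.0]])        # rank-1 expert: A is 1x2, B is 2x1
    print(moe_lora_forward(x, w, [lora, None], [0.0, 0.0]))    # gate split evenly
    print(moe_lora_forward(x, w, [lora, None], [-50.0, 50.0]))  # routed to zero expert
```

In the second call nearly all gate mass goes to the zero expert, so the output is effectively the frozen layer's output, which is the property the abstract attributes to the zero expert.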

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space

Neural Information Processing Systems | Jun-19-2026, 03:37:16 GMT

Existing methods typically establish a direct joint-to-joint mapping from 2D to 3D poses based on 2D features. This formulation suffers from two fundamental limitations: inevitable error propagation from input predicted 2D pose to 3D predictions and inherent difficulties in handling self-occlusion cases. In this paper, we propose PandaPose, a 3D human pose lifting approach via propagating 2D pose prior to 3D anchor space as the unified intermediate representation. Specifically, our 3D anchor space comprises: (1) Joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors to mitigate 2D pose estimation inaccuracies.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: